ABSTRACT

The general context of this study is the post-processing of multiline spectropolarimetric observations of stars. In particular, it concerns the numerical analysis techniques aimed at detecting and characterizing polarized signatures. Using real observational data, we compare several analysis methods and clarify a number of points about them. We apply simple line addition, least-squares deconvolution and denoising by principal component analysis to polarized stellar spectra from the TBLegacy database of the Narval spectropolarimeter, and compare the results. Comparing approaches of different levels of sophistication allows us to make a safe choice for the upcoming implementation of on-line post-processing of our unique database for the stellar physics community.
1. Introduction

The present study concerns the post-processing of multiline spectropolarimetric measurements, in particular of stellar data. We focus on data collected since 2006 with the Narval spectropolarimeter. Narval is mounted on the 2-m TBL telescope at the summit of the Pic du Midi de Bigorre (France). In particular, we investigate the capabilities of principal component analysis (hereafter PCA) applied to observations made with Narval.

PCA has been regularly used in solar spectropolarimetry during the last decade (see e.g., Rees et al. 2000 and Skumanich & Lopez Ariste 2002). Its main purpose was to provide an alternative way of inverting spectropolarimetric data. The goal is to determine the vector magnetic field present in various solar features, from sunspots to solar prominences (see e.g., Lopez Ariste & Casini 2002).

Concerning stellar data, PCA-based denoising of spectral lines was first presented by Carroll et al. (2007). It was further tested on data taken with the SOFIN spectrograph at the NOT telescope. This procedure was mainly motivated by the goal of performing Zeeman-Doppler Imaging (hereafter ZDI; see Semel 1989) from temporal sequences of individual spectral lines. This is an alternative to using pseudo-profiles such as those commonly computed by least-squares deconvolution (hereafter LSD; see Donati et al. 1997, and Kochukhov et al. 2010 for a recent review and discussion). More recently, Martinez Gonzalez et al. (2008) discussed in detail the capabilities of PCA denoising of solar and stellar spectropolarimetric data, using synthetic data. They also commented on the relationship between PCA denoising, line addition and least-squares deconvolution. Later on, Ramirez Velez et al. (2010) proposed another PCA-based method, coupled to ZDI. They applied it to a very limited set of observational data, taken both at the AAT telescope with the SemelPol spectropolarimeter and with Narval at the TBL.

Hereafter we revisit some details of PCA denoising and analysis of observational spectropolarimetric data, and we further discuss the practical capabilities of such an approach. We also compare it with LSD and with the so-called (simple) line addition method (hereafter SLA; Semel et al. 2009).



2. The source of data 

We have been using Narval data available from the public database TBLegacy¹. Narval is a state-of-the-art spectropolarimeter operating in the 0.38-1 μm spectral domain, with a spectral resolution of 65 000 in its polarimetric mode. It is an improved copy of the Espadons spectropolarimeter, adapted to the 2-m TBL telescope. Espadons has been in operation since 2004 at the 3.6-m CFHT telescope (see Donati et al. 2006 for further technical details).

The TBLegacy database has been operational since 2007. It is currently the largest on-line archive of high-resolution polarization spectra. It hosts data taken at the 2-m TBL telescope since December 2006. So far, more than 70 000 spectra have been made available, for more than 370 distinct targets all over the Hertzsprung-Russell diagram. More than 13 000 polarized spectra are also available, mostly in circular polarization. Linear polarization data are still rare, amounting to a few hundred spectra, but they are equally available. By default, the polarized spectra are given as the usual circular polarization $V(\lambda)/I_c$, normalized to the local continuum intensity $I_c$.

At present, the TBLegacy database provides only wavelength-calibrated Stokes $I$ or $V/I_c$ spectra. Stokes $I$ data are available either normalized to the local continuum or not.



1 http://tblegacy.bagn.obs-mip.fr/ 
Fig. 1. Comparison between LSD (full lines) and SLA (dashed lines) $I/I_c$ and $V/I_c$ pseudo-line profiles, for II Peg observations of August 2008. Stokes V profiles have been shifted by 1.1 for display. The largest-amplitude lobe, for LSD, is about 0.1% of $I_c$ in that case. The $\bar{P}_1$ profiles (dot-dashed) for both $I/I_c$ and $V/I_c$ resulting from PCA analysis of the data are also displayed for comparison.

As a next step, further post-processing of these spectra will be offered on-line to users, and the relevant software will be made fully available to the community. This will apply to the standard simple line addition and least-squares deconvolution procedures used in the present study, as well as to PCA denoising.

3. Numerical procedures 

3.1. The matrix of observations 

Observations we get from TBLegacy are basically Stokes $I(\lambda)$ or $V(\lambda)$. Each of them consists of a very large array of about 200 000 elements covering the whole spectral domain observable by Narval. The main task in building the matrix of observations $O$ is to split the multiline observations vs. wavelength into $N_{\rm obs}$ elementary profiles. Each profile is centered at a given wavelength and projected onto a common velocity grid. This velocity grid is an a priori input to our numerical procedure. Its choice depends on the spectral sampling of the original data, which is of the order of 1.8 km s$^{-1}$ for Narval. It also depends on the nature of the target, which sets the velocity range to be considered (typically between ±120 and ±200 km s$^{-1}$). In practice we deal with $N_v$ velocity bins of the order of $10^2$. The original $I$ or $V$ data (sampled in wavelength) are recast into $N_{\rm obs}$ elementary velocity-sampled profiles, where $N_{\rm obs}$ is of the order of $10^3$-$10^4$, depending on the spectral type of the target.

The transformation of the original data requires a supplementary file, usually called a "mask". The mask lists all the rest wavelengths, $\lambda_0$, of the spectral lines expected to be present in the observations of a given spectral type of star. For all the cases discussed hereafter, we used mask files that are widely used by the community and built from the VALD database (Piskunov et al. 1995; resources for this study were kindly provided to us by E. Alecian). In general, these mask files contain additional information about each spectral line required by LSD (see next section), in particular its central line depression, $d_i$, and its effective Landé factor, $g_i$.

Fig. 2. Same as Fig. (1) but for ε Eri observations of February 2007. Stokes V profiles have been shifted by 1.1 and multiplied by 3. The largest-amplitude lobe, for LSD, is about 0.05% of $I_c$ in that case.

Therefore, given a proper mask and a velocity grid, it is easy and straightforward to transform $I(\lambda)$ or $V(\lambda)$ data into $N_{\rm obs}$ individual $I(v)$ or $V(v)$ profiles. This follows from the Doppler-Fizeau effect and the well-known relationship

$$ \frac{v}{c} = \frac{\delta\lambda}{\lambda_0} \qquad (1) $$

where $\delta\lambda = \lambda - \lambda_0$ is computed from the original data (see also §2.1 in Ramirez Velez et al. 2010).

This operation results in a rectangular $(N_{\rm obs}, N_v)$ matrix of observations $O$, which is then used in several different ways.
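The construction of $O$ from a spectrum, a mask and a velocity grid can be sketched as follows. This is a minimal illustration, not the TBLegacy implementation. We assume a single merged spectrum with strictly increasing wavelengths; raw echelle orders would first need to be merged and sorted. We also assume linear interpolation onto the velocity grid. The function name `build_observation_matrix` is hypothetical.

```python
import numpy as np

C_KMS = 299_792.458  # speed of light in km/s


def build_observation_matrix(wavelength, flux, line_centres, velocity_grid):
    """Hypothetical helper: recast a 1-D spectrum into an (N_obs, N_v) matrix O.

    wavelength    : 1-D array, strictly increasing (assumed merged, sorted spectrum)
    flux          : 1-D array, Stokes I/I_c or V/I_c sampled on `wavelength`
    line_centres  : rest wavelengths lambda_0 from a line mask (same unit as wavelength)
    velocity_grid : common, increasing velocity grid in km/s
    """
    rows = []
    for lam0 in line_centres:
        # Wavelength window spanned by the velocity grid around this line
        lo = lam0 * (1.0 + velocity_grid[0] / C_KMS)
        hi = lam0 * (1.0 + velocity_grid[-1] / C_KMS)
        if lo < wavelength[0] or hi > wavelength[-1]:
            continue  # skip lines whose window is not fully covered by the data
        i0 = max(np.searchsorted(wavelength, lo) - 1, 0)
        i1 = np.searchsorted(wavelength, hi) + 1
        # Doppler-Fizeau relation, Eq. (1): v = c * (lambda - lambda_0) / lambda_0
        v = C_KMS * (wavelength[i0:i1] - lam0) / lam0
        # Linear interpolation of the local spectrum onto the common velocity grid
        rows.append(np.interp(velocity_grid, v, flux[i0:i1]))
    return np.array(rows).reshape(-1, len(velocity_grid))
```

Blends are not treated specially in this sketch: each mask line simply produces one row of $O$, and lines falling outside the observed domain are dropped.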

3.2. Simple line addition vs. LSD 

For a wealth of data in TBLegacy, down to polarized signatures $V/I_c$ of the order of 0.01%, SLA pseudo-profiles are very meaningful. These profiles result from the simple line addition (more precisely, the unweighted arithmetic mean) of the $N_{\rm obs}$ individual spectral lines of $O$. They are useful both for detecting the polarized signature carried by the multiline, but noisy, observations and for characterizing it (i.e., properly determining its shape and amplitudes). Moreover, SLA profiles are very similar to those obtained from least-squares deconvolution. This was mentioned and discussed in the very instructive, but unfortunately overlooked, recent article of Semel et al. (2009). Nevertheless, to the best of our knowledge, no direct comparison between LSD and SLA profiles obtained with real data, such as Narval's, has been published yet.

To remedy that, we display in Figs. (1) and (2) both LSD and SLA pseudo-profiles computed from the same sets of observations. The SLA profiles are obtained directly as a simple average of all the rows of the matrix $O$. Fig. (1) corresponds to observations of the RS CVn star II Peg made in August 2008. The LSD profiles for Stokes $I$ have been computed using weights $w_I = d_i$, normalized to the arithmetic mean of the considered central line depressions $d_i$. Those for Stokes $V$ were computed with weights $w_V = g_i \lambda_{0,i} d_i$, normalized to the arithmetic mean of the $w_V$'s. No line depth cut-off criterion was adopted here, since the mask lines originally have depressions greater than or equal to 10% of the continuum.

Considerations and recommendations about the definition of LSD weights (especially their normalization) and about the line depth cut-off criterion can be found in Kochukhov et al. (2010). That study revealed a lack of consistency among LSD users. Subsequent articles unfortunately still fail to provide details about the exact procedure applied to the data; see e.g., Kochukhov et al. (2011) or Donati et al. (2011). To conclude on these points, we again recommend that this community carefully read Semel et al. (2009), especially their §2.3 on the statistical properties of (LSD) weights.
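The close similarity between SLA and LSD can be made explicit in the textbook simplified case where lines are unblended and all pixels carry equal noise. In that case, fitting each row as $O_j(v) = w_j Z(v)$ by least squares gives $Z = \sum_j w_j O_j / \sum_j w_j^2$. The sketch below shows only this reduced case. Real LSD solves the full least-squares system with a line-pattern matrix accounting for blends and per-pixel noise (Donati et al. 1997). The function names are hypothetical.

```python
import numpy as np


def sla_profile(O):
    """Simple line addition: unweighted arithmetic mean of the rows O_j of O."""
    return O.mean(axis=0)


def lsd_weights(depth, lande_g=None, lambda0=None, stokes="V"):
    """Weights as described in the text, normalized to their arithmetic mean:
    w_I = d_i for Stokes I, w_V = g_i * lambda_0,i * d_i for Stokes V."""
    w = np.asarray(depth, dtype=float)
    if stokes == "V":
        w = np.asarray(lande_g, dtype=float) * np.asarray(lambda0, dtype=float) * w
    return w / w.mean()


def lsd_profile_unblended(O, w):
    """Least-squares solution of O_j(v) = w_j * Z(v) + noise (unblended lines,
    equal noise everywhere): Z = sum_j w_j O_j / sum_j w_j**2."""
    w = np.asarray(w, dtype=float)
    return (w @ O) / np.sum(w ** 2)
```

Consider a noise-free toy case in which every row of $O$ is the same profile scaled by its (mean-normalized) weight. Both estimators then return that profile. With real data, they differ only through how the weights redistribute the contribution of each line.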

For II Peg, the mask contained about 6600 wavelengths covering the 400-1000 nm range. It was built from VALD data for $T_{\rm eff}$ = 5000 K, a surface gravity of log g = 3.0 (cgs) and solar abundances. Concerning this choice of stellar parameters, Berdyugina et al. (1998) determined $T_{\rm eff}$ = 4600 K and log g = 3.2, although VizieR also reports values of $T_{\rm eff}$ as high as 5250 K. As can be seen in Fig. (1), LSD and SLA both recover the respective shapes of $I/I_c$ and $V/I_c$ well, and the two sets of profiles are indeed very similar. The amplitudes of the $I/I_c$ and $V/I_c$ LSD pseudo-profiles appear systematically slightly larger than the SLA ones. However, this will not significantly impair the subsequent determination of the mean line-of-sight magnetic field. That determination is usually made with the centre of gravity method, assuming the weak-field regime of the Zeeman effect (Rees & Semel 1979, and references therein).

We noticed similar effects, displayed in Fig. (2), using Narval observations of the K2V star ε Eri made in February 2007. For that case, after inspecting all VizieR resources, we adopted a mask computed for $T_{\rm eff}$ = 5000 K, a surface gravity of log g = 4.5 (cgs) and solar abundances (see also Koleva & Vazdekis 2012).

Using a test version of TBLegacy currently under development², we have been able to verify how similar SLA and LSD signatures are in many other cases, including magnetic stars hotter than the ones discussed in this article.



3.3. Principal component analysis 

Following Martinez Gonzalez et al. (2008), we built the cross-product matrix $C = O^T O$ and computed its eigenvalues $s_i$ and eigenvectors $e_i$ (hereafter, eigenprofiles). We denote by $O_j(v)$ the observation made at wavelength index $j$, and we omit the dependence on $v$ of these individual profiles. The PCA analysis we have carried out requires no physical assumption about the line formation process or the origin of the polarization signals.
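The eigen-decomposition described above can be sketched with NumPy as follows. This follows the text's definition $C = O^T O$ without subtracting a mean profile, so $C$ is a cross-product matrix rather than a covariance matrix. The function name is hypothetical.

```python
import numpy as np


def eigenprofiles(O):
    """Eigenvalues s_i and eigenprofiles e_i of the cross-product matrix C = O^T O.

    O has shape (N_obs, N_v), so C is an (N_v, N_v) symmetric, positive
    semi-definite matrix. No mean is subtracted from O.
    Returns s sorted in decreasing order and E whose column i is e_i.
    """
    C = O.T @ O
    s, E = np.linalg.eigh(C)          # eigh: ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(s)[::-1]       # reorder from largest to smallest eigenvalue
    return s[order], E[:, order]
```

Since $C$ is only $N_v \times N_v$ (of the order of $10^2$), this decomposition is cheap even when $N_{\rm obs}$ reaches $10^4$.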

Martinez Gonzalez et al. (2008) demonstrated this behaviour in their Fig. (1). Without any noise, or with only a limited amount of it (as is the case for Stokes $I$ data from TBLegacy), a few eigenvalues $s_i$ of $C$ dominate the sequence, sometimes by orders of magnitude compared to the smallest ones. However, for significant noise levels, as will be the case hereafter for Stokes $V$ data, the sequence of $s_i$ generally decreases very slowly; see e.g., Fig. (6).
Fig. 3. Example of PCA denoising. The original noisy signal $O_j(v)$ (dashed line) for the 612.2 nm line of Ca I is displayed, together with its projection $P_{j,1}$ (full thick line) on the eigenprofile of matrix $C$ associated with the largest eigenvalue. The latter profile already has a shape very similar to the SLA (or LSD) pseudo-profiles obtained with the whole set of observations, displayed in Fig. (1).
Fig. 4. Comparison between the original map of the matrix of observations $O$ (left) and the map of the $P_{j,1}$ (right), for II Peg observations of August 2008. The efficiency of the PCA denoising is obvious at almost all wavelengths.



Even though the sequence of eigenvalues $s_i$ decreases very slowly for most Stokes $V$ data from TBLegacy, we first tried PCA denoising by computing

$$ P_{j,k} = \sum_{i=1}^{k} (O_j \cdot e_i)\, e_i \qquad (2) $$

for $k = 1$. Fig. (3) shows the result for the wavelength index $j$ corresponding to the strong, magnetically sensitive Ca I line at 612.2 nm. It illustrates the efficiency of PCA denoising using only the projection onto the eigenprofile $e_1$ associated with the largest eigenvalue $s_1$. At this signal-to-noise ratio, the gain provided by PCA denoising is clearly significant. It potentially allows the detection of a meaningful polarized signature buried in the noise. Moreover, the single denoised profile $P_{j,1}$ in Fig. (3) already has a shape very similar to the SLA (or LSD) pseudo-profiles obtained from the whole set of observations (Fig. (1)).

2 The (Python) software implemented for such an analysis will be made public; it is already available upon request from the author.
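Eq. (2) can be applied to all rows of $O$ at once, as in the minimal sketch below. The function name `pca_denoise` is hypothetical. The eigenprofiles are recomputed inside the function so that the block stands alone.

```python
import numpy as np


def pca_denoise(O, k):
    """PCA denoising, Eq. (2): P_{j,k} = sum_{i=1..k} (O_j . e_i) e_i for every row j.

    The e_i are eigenprofiles of C = O^T O sorted by decreasing eigenvalue.
    Returns an array with the same shape as O holding all the P_{j,k}.
    """
    s, E = np.linalg.eigh(O.T @ O)
    Ek = E[:, np.argsort(s)[::-1][:k]]   # first k eigenprofiles, shape (N_v, k)
    return (O @ Ek) @ Ek.T               # project each O_j and rebuild it in velocity space
```

Consider a synthetic matrix whose rows are scaled copies of one antisymmetric, Stokes V-like profile plus white noise. In that case `pca_denoise(O, 1)` removes most of the noise. Keeping all $N_v$ eigenprofiles returns $O$ unchanged, because the eigenprofiles form a complete orthonormal basis of the velocity space.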

The efficiency of PCA denoising at all wavelengths can also be seen in Fig. (4). There we display images of the observations matrix $O$ (left) and of the matrix of the $P_{j,1}$'s (right). In that case, clear polarized signatures emerge at almost all observed wavelengths. This opens the possibility of directly exploiting single-line data, instead of a pseudo-profile combining all of the multiline signatures. The same is true for the ε Eri data, for instance, even though its SLA (or LSD) signature is significantly smaller than II Peg's.



4. Comparison with SLA and LSD 

The previous examples show how efficient PCA denoising can be on real stellar data. It can be very useful for detection purposes. But could it also offer an alternative to the LSD or SLA methods?

The case of ε Eri is interesting because its polarization signature is simpler than that of II Peg, but of much smaller amplitude. Consider the LSD pseudo-line profile and the SLA one, the latter being

$$ \bar{O}(v) = \frac{1}{N_{\rm obs}} \sum_{j=1}^{N_{\rm obs}} O_j(v) \qquad (3) $$

Both show a clear antisymmetric $V/I_c$ profile, with negative and positive lobes of about 0.04-0.05% in amplitude, spanning a velocity range $\Delta v \approx 30$ km s$^{-1}$. Two distinct methods recover this signature. However, it is not fully recovered when we consider only the mean of the projections of the $O_j$'s onto eigenprofile $e_1$. The resulting mean profile is about a factor of 2 smaller in amplitude than $\bar{O}$. Its lobes are also wider than those of the LSD or SLA pseudo-profiles; see again Fig. (2). Beyond the detection capability of PCA-based denoising, this raises a further question: how to properly characterize the "most common" polarization signal content of the multiline observations.

Fig. 5. Map of $(\bar{P}_k - \bar{O})$ from circular polarization observations of ε Eri made in February 2007 (horizontal axis: velocity index). It takes about $k$ = 50-60 eigenprofiles to recover the mean profile $\bar{O}$. The color scale on the right side of the image also indicates that, in this case, $\bar{P}_{k=1}$ can be a factor of 2 smaller in amplitude than the SLA mean profile, whose amplitude is about 0.04%.

To investigate this point, we built the map displayed in Fig. (5) from the successive differences between

$$ \bar{P}_k(v) = \frac{1}{N_{\rm obs}} \sum_{j=1}^{N_{\rm obs}} P_{j,k}(v) \qquad (4) $$

and $\bar{O}$. About 50 eigenprofiles are clearly needed to recover, from a PCA analysis, a pseudo-line comparable to the SLA (i.e., $\bar{O}$) or LSD ones. This result clearly contradicts the comments about $\bar{P}_1$ and LSD or SLA pseudo-profiles made in §5.1 of Martinez Gonzalez et al. (2008), which were based on noisy but synthetic data. PCA denoising can indeed be made equivalent to the line addition technique, as well as to least-squares deconvolution. For the TBLegacy data used in this study, however, this requires considering a set of eigenprofiles rather than only the one associated with the largest eigenvalue of $C$. We noticed similar behaviours for both the II Peg and ε Eri data.
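The difference map of Fig. (5) can be sketched as follows. Because the projection of Eq. (2) is linear, the mean over $j$ of the $P_{j,k}$ equals the projection of $\bar{O}$ itself onto the first $k$ eigenprofiles. Each row of the map is therefore a partial sum. The function name is hypothetical.

```python
import numpy as np


def mean_projection_residuals(O, k_max):
    """Map of (Pbar_k - Obar) for k = 1..k_max, one row per k (cf. Eqs. 3 and 4).

    Because Eq. (2) is linear, the mean of the P_{j,k} over j equals the
    projection of the mean profile Obar on the first k eigenprofiles.
    """
    s, E = np.linalg.eigh(O.T @ O)
    E = E[:, np.argsort(s)[::-1]]               # column i = e_i, by decreasing s_i
    O_bar = O.mean(axis=0)                       # SLA pseudo-profile, Eq. (3)
    terms = (E.T @ O_bar)[:, None] * E.T         # row i: (Obar . e_i) e_i
    P_bar = np.cumsum(terms, axis=0)[:k_max]     # row k-1: Pbar_k, Eq. (4)
    return P_bar - O_bar
```

Displayed as an image (rows: $k$, columns: velocity index), this array shows how many eigenprofiles are needed before the residual becomes negligible. With `k_max` equal to $N_v$, the last row is zero up to rounding.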

To understand this behaviour, it is worthwhile analysing, in addition to the polarization data, the so-called "null" spectra, $N(\lambda)$, which come with standard Narval (and Espadons) data. It is now customary in stellar spectropolarimetry to use a double beam-exchange method. This method records a sequence of 4 sub-exposures associated with 2 distinct and opposite polarization states (see e.g., Semel & Li 1996). Null profiles result from a combination of the sub-exposures similar to the one used to extract the polarization signal. Unlike that combination, however, it removes any polarization signal of astrophysical origin. The null spectrum is mainly used to detect any spurious signal in the data that may corrupt the astrophysical signal. For clean observations (i.e., when $N(\lambda)$ is structureless), it basically contains noise, at the same level as the noise remaining in the polarized spectra.

Fig. 6. Successive eigenvalues of the cross-product matrices computed respectively from the $V$ (dotted lines) and the $N$ (crossed lines) data of II Peg (discontinuous lines) and ε Eri (continuous lines) from TBLegacy. Eigenvalues for ε Eri were magnified by a factor of 10.

The number of eigenprofiles needed for the reconstructed $\bar{P}_k$ to be comparable to LSD or SLA pseudo-profiles is roughly given by the index at which the eigenvalue sequences of $V$ and $N$ overlap. Figure (6) displays two such pairs of sequences, which indeed overlap for $k \approx 50$ in these cases. II Peg data are represented by discontinuous lines, while ε Eri data are represented by continuous lines; for the latter, eigenvalues were magnified by a factor of 10. In both cases, $V$-eigenvalues correspond to dotted lines, while cross symbols are for $N$-eigenvalues. Martinez Gonzalez et al. (2008) put forward the same kind of empirical criterion. They did so in their discussion of the respective PCA analyses of a "correlated" (synthetic) data set and of uncorrelated (Gaussian) noise.

In summary, the simultaneous PCA analysis of $V$ and $N$ allows for two things. (a) It allows the detection of a polarized signature in the data, if the condition

$$ s_1^{(V)} > s_1^{(N)} $$

is satisfied. (b) It allows the characterization of a single representative signature, similar to LSD or SLA pseudo-profiles. This is done by projecting the original data on a number of eigenprofiles given by the index at which the two eigenvalue sequences $s_i^{(V)}$ and $s_i^{(N)}$ overlap.
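The combined $V$/$N$ criterion can be sketched as follows, under explicit assumptions. We read the detection condition as "the leading $V$ eigenvalue exceeds the leading $N$ eigenvalue". We operationalize "overlap" as the first index at which the $V$ sequence no longer exceeds the $N$ sequence. Both readings, and the function names, are our hypothetical choices. Noisy sequences may cross more than once, so a tolerance or smoothing may be needed in practice.

```python
import numpy as np


def sorted_eigenvalues(O):
    """Eigenvalues of the cross-product matrix O^T O, in decreasing order."""
    return np.sort(np.linalg.eigvalsh(O.T @ O))[::-1]


def compare_v_and_null(O_V, O_N):
    """Compare the eigenvalue sequences of the V and null observation matrices.

    Returns (detected, k):
      detected: True when the leading V eigenvalue exceeds the leading N one;
      k: number of leading V eigenvalues above the N sequence before the two
         sequences first meet (0 if they meet at the first index).
    Both matrices are assumed built with the same mask and velocity grid.
    """
    s_V = sorted_eigenvalues(O_V)
    s_N = sorted_eigenvalues(O_N)
    detected = bool(s_V[0] > s_N[0])
    overlap = np.nonzero(s_V <= s_N)[0]       # indices where V no longer exceeds N
    k = int(overlap[0]) if overlap.size else len(s_V)
    return detected, k
```

The returned `k` would then be used as the number of eigenprofiles in Eq. (2) before averaging over the lines.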

5. Intrinsic dimension of the dataset

Finally, we evaluate the intrinsic dimensionality of our main II Peg and ε Eri data sets. We follow the analysis presented in Asensio Ramos et al. (2007), which was illustrated there with synthetic data and solar spectropolarimetric observations. To this end, we computed maximum likelihood dimension estimators $m$ for different numbers $n$ of neighbours, for each of the profiles contained in the observations matrix $O$.

We adopted the formula modified by MacKay & Ghahramani (2005)³, based on the initial work of Levina & Bickel (2005). For both the II Peg and ε Eri data, values of $m$ lie in a range of about 38-48, for $n$ ranging from 3 to 75. This is quite consistent with our PCA analysis of $\bar{P}_k$ vs. $\bar{O}$. That analysis shows that our noisy data force us to consider more eigenprofiles than expected a priori from Martinez Gonzalez et al. (2008).
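For reference, the estimator can be written as follows. For each point $x$ with nearest-neighbour distances $T_1 \le \dots \le T_n$, the local Levina & Bickel estimate satisfies $1/m(x) = \frac{1}{n-1}\sum_{j=1}^{n-1}\ln\left(T_n(x)/T_j(x)\right)$. The MacKay & Ghahramani modification averages these inverses over all points before inverting once. The sketch below is a brute-force illustration under these definitions, not the code used for the values quoted above. The function name is hypothetical. Rows are assumed distinct, since a zero distance would make a logarithm diverge.

```python
import numpy as np


def mle_intrinsic_dimension(X, n):
    """Maximum likelihood estimate of intrinsic dimension from n nearest neighbours.

    Local estimate (Levina & Bickel 2005), for each point x with neighbour distances
    T_1 <= ... <= T_n:   1/m(x) = mean_{j<n} log(T_n(x) / T_j(x)).
    MacKay & Ghahramani (2005) modification: average 1/m(x) over all points,
    then invert once. X has one point per row (here, the rows O_j of O).
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)          # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                               # a point is not its own neighbour
    T = np.sqrt(np.maximum(np.sort(d2, axis=1)[:, :n], 0.0))   # T_1..T_n for each point
    inv_m = np.mean(np.log(T[:, -1:] / T[:, :-1]), axis=1)     # 1/m(x) for each point
    return 1.0 / np.mean(inv_m)
```

Applied to points spread over a 2-D plane embedded in 10 dimensions, the estimate comes out close to 2. The full distance matrix grows as $N_{\rm obs}^2$ in memory, so a tree-based neighbour search would be preferable for the largest matrices.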

6. Conclusion 

We have experimented with different methods of analysis of multiline polarized spectra of stars. Using real data, we have shown that the simple line addition technique (Semel et al. 2009) produces pseudo-profiles very similar to those computed by least-squares deconvolution. It is also much simpler to implement and requires less external input data. This makes it both simple and efficient, and therefore very suitable as a standard post-processing tool for the TBLegacy database content.

Our study shows no clear advantage of LSD over SLA. Furthermore, systematic use of LSD for stellar spectropolarimetric databases would require, for the sake of interoperability, a specific protocol. This protocol would have to fix the line depth cut-off criterion and the normalization of the weights used for both Stokes $I$ and $V$ data processing.

We have also applied PCA denoising to real (and noisy) observational data, and found it very efficient. We have shown that it can provide an alternative to the SLA or LSD post-processing methods for characterizing the polarization content of multiline observations. This requires first carefully estimating the necessary number of eigenprofiles of the cross-product matrix of the observations. That number can be derived from the combined PCA analysis of $V$ and $N$ data. Finally, like SLA, PCA denoising is in principle equally applicable to all kinds of polarization signals, whatever their physical origin or the observed state of polarization, circular or linear.

3 http://www.inference.phy.cam.ac.uk/mackay/dimension/ - see also Eq. (5) in Asensio Ramos et al. (2007)